Informatik Ensemble and Constrained Clustering with Applications
Authors
Abstract
This thesis presents further developments in ensemble clustering and constrained clustering. The proposed methods target clustering problems in which the optimal clustering method is unknown. In addition, pairwise linkage constraints make it possible to bring extra information into the clustering framework.
Part I investigates ensemble clustering. It presents a comprehensive review of the state of the art and then discusses the impact of ensemble variability on the final consensus result. Visualization of ensemble variability based on multidimensional scaling is also addressed, and software that performs ensemble clustering using various existing consensus functions is introduced. A consensus function based on the random walker, originally developed for image segmentation combination, is adapted to the ensemble clustering problem. A lower bound is proposed to explore how well cluster ensemble methods perform in an absolute sense, without the use of ground truth. The part closes with a study evaluating how well general ensemble clustering techniques perform in the context of image segmentation combination.
Part II introduces an ensemble clustering method based on a new formulation of the median partition problem. The performance of this method is assessed against other well-known ensemble clustering methods.
Part III addresses the potential of ensemble techniques in the framework of constrained clustering. It presents a comprehensive review of the state of the art in constrained clustering and discusses the impact of considering constraints locally or globally, with an experiment comparing both approaches. A new clustering method that combines ensemble and constrained clustering is introduced, and constraints are incorporated into three consensus functions. The part closes with an experimental evaluation in which constraints are considered at different steps of the clustering ensemble framework.
Similar resources
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...
A new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble
Ensemble clustering has been considered one of the research approaches in data mining, pattern recognition, machine learning, and artificial intelligence over the last decade. In clustering, the combination first produces several base clusterings and then uses a function to aggregate them into a final clustering that is as similar as possible to all of the base clusterings. The inp...
The ensemble clustering with maximize diversity using evolutionary optimization algorithms
Data clustering is one of the main steps in data mining, which is responsible for exploring hidden patterns in non-tagged data. Due to the complexity of the problem and the weakness of the basic clustering methods, most studies today are guided by clustering ensemble methods. Diversity in primary results is one of the most important factors that can affect the quality of the final results. Also...
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Publication date: 2010